    Publishing perishing? Towards tomorrow's information architecture

    Scientific articles are tailored to present information in human-readable aliquots. Although the Internet has revolutionized the way our society thinks about information, the traditional text-based framework of the scientific article remains largely unchanged. This format imposes sharp constraints on the type and quantity of biological information published today. Academic journals alone cannot capture the findings of modern genome-scale inquiry. Like many other disciplines, molecular biology is a science of facts: information inherently suited to database storage. In the past decade, a proliferation of public and private databases has emerged to house genome sequences, protein structure information, functional genomics data and more; these digital repositories are now a vital component of scientific communication. The next challenge is to integrate this vast and ever-growing body of information with academic journals and other media. To truly integrate scientific information, we must modernize academic publishing to exploit the power of the Internet. This means more than online access to articles, hyperlinked references and web-based supplemental data; it means making articles fully computer-readable with intelligent markup and Structured Digital Abstracts. Here, we examine the changing roles of scholarly journals and databases. We present our vision of the optimal information architecture for the biosciences, and close with tangible steps to improve our handling of scientific information today while paving the way for an expansive central index in the future.
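The abstract describes Structured Digital Abstracts only at the level of intent. As a hedged illustration of what "fully computer-readable" could mean, the Python sketch below stores an article's key findings as typed entity-relation records and serializes them to JSON. The class names, field names, the predicate term and the placeholder accessions are all hypothetical; this is not a published SDA schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class Entity:
    database: str   # source database, e.g. "UniProt"
    accession: str  # identifier in that database (placeholders below)
    label: str      # human-readable name

@dataclass
class Relation:
    subject: Entity
    predicate: str  # hypothetical controlled-vocabulary term
    obj: Entity

@dataclass
class StructuredDigitalAbstract:
    title: str
    relations: List[Relation] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses held inside lists
        return json.dumps(asdict(self), indent=2)

# Hypothetical example: one machine-readable finding from an imaginary article
a = Entity("UniProt", "PLACEHOLDER_A", "protein A")
b = Entity("UniProt", "PLACEHOLDER_B", "protein B")
sda = StructuredDigitalAbstract(
    "Example article",
    [Relation(a, "physically_interacts_with", b)],
)
print(sda.to_json())
```

Because every entity carries a database name and accession, a downstream index could link such a record straight back to the repositories the abstract describes.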

    Challenges in integrating Escherichia coli molecular biology data

    One key challenge in Systems Biology is to provide mechanisms for collecting and integrating the data needed to meet multiple analysis requirements. Biological content is typically scattered over multiple data sources, and there is no easy way to compare heterogeneous content. This work discusses ongoing standardisation and interoperability efforts and exposes integration challenges for the model organism Escherichia coli K-12. The goal is to analyse the major obstacles faced by integration processes, suggest ways to identify them systematically and, whenever possible, propose solutions or means to assist manual curation. Integration of gene, protein and compound data was evaluated by comparing the contents of EcoCyc, KEGG, BRENDA, ChEBI, Entrez Gene and UniProt. Cross-links, a number of standard nomenclatures and name information supported the comparisons. Except in the gene integration scenario, no single integration element performed well enough to support the process on its own; indeed, integrating both enzyme and compound records requires considerable curation. The results show that, even for a well-studied model organism, source contents are still far from as standardized as would be desirable, and metadata vary considerably from source to source. Before designing any data integration pipeline, researchers should decide on the sources that best fit the purpose of the analysis and be aware of existing conflicts and inconsistencies so that they can intervene in their resolution. Moreover, they should be aware of the limits of automatic integration so that they can define the extent of manual curation each application requires.
    Portuguese FCT funded MIT-Portugal Program in Bioengineering (MIT-Pt/BS-BB/0082/2008); PhD grant from FCT (ref. SFRH/BD/22863/2005) to S.
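The cross-link comparisons described in this abstract can be illustrated with a minimal Python sketch, assuming each source is a dictionary of records that carry a name and a set of cross-reference identifiers. Records are paired when they share a cross-link, and anything unmatched, ambiguous or name-inconsistent is routed to manual curation. The record layout, the `normalize` rule and all identifiers are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, List, Set, Tuple

def normalize(name: str) -> str:
    """Crude, hypothetical name normalization: lowercase, keep letters/digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def integrate(
    source_a: Dict[str, dict], source_b: Dict[str, dict]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    # Index source B records by every cross-reference they carry
    index: Dict[str, Set[str]] = {}
    for b_id, rec in source_b.items():
        for xref in rec["xrefs"]:
            index.setdefault(xref, set()).add(b_id)

    matched: List[Tuple[str, str]] = []
    to_curate: List[Tuple[str, str]] = []  # (record id in A, reason)
    for a_id, rec in source_a.items():
        # Collect every B record reachable through any of A's cross-links
        candidates: Set[str] = set()
        for xref in rec["xrefs"]:
            candidates |= index.get(xref, set())
        if not candidates:
            to_curate.append((a_id, "no cross-link match"))
        elif len(candidates) > 1:
            to_curate.append((a_id, "ambiguous: " + ", ".join(sorted(candidates))))
        else:
            b_id = next(iter(candidates))
            # A cross-link match is accepted only if the names also agree
            if normalize(rec["name"]) == normalize(source_b[b_id]["name"]):
                matched.append((a_id, b_id))
            else:
                to_curate.append((a_id, "name conflict with " + b_id))
    return matched, to_curate

# Hypothetical toy sources
source_a = {
    "a1": {"name": "geneA", "xrefs": {"X:1"}},
    "a2": {"name": "geneB", "xrefs": {"X:2"}},
    "a3": {"name": "geneC", "xrefs": {"X:3"}},
}
source_b = {
    "b1": {"name": "GENE-A", "xrefs": {"X:1"}},
    "b2": {"name": "geneZ", "xrefs": {"X:2"}},
}
matched, to_curate = integrate(source_a, source_b)
print(matched)    # [('a1', 'b1')]
print(to_curate)  # [('a2', 'name conflict with b2'), ('a3', 'no cross-link match')]
```

Real sources would need much richer name handling (synonym lists, controlled nomenclatures); the sketch only shows where automatic matching stops and curation begins.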

    Enzymes Are Enriched in Bacterial Essential Genes

    Essential genes, those indispensable for the survival of an organism, play a key role in the emerging field of synthetic biology. Characterizing the functions encoded by essential genes not only has important practical implications, such as identifying antibiotic drug targets, but can also deepen our understanding of basic biology, such as the functions needed to support cellular life. Enzymes are critical for almost all cellular activities. However, essential genes have not been systematically examined from the perspective of enzymes and the chemical reactions they catalyze. Here, by comprehensively analyzing essential genes in 14 bacterial genomes for which large-scale gene essentiality screens have been performed, we found that enzymes are enriched among essential genes. Among essential enzymes, ligases (especially those forming carbon-oxygen and carbon-nitrogen bonds), nucleotidyltransferases and phosphotransferases are overrepresented, while oxidoreductases are underrepresented. Furthermore, essential enzymes tend to be associated with more Gene Ontology domains. These results, viewed from the perspective of chemical reactions, provide further insight into the functions needed to support natural cellular life as well as synthetic cells, and provide additional parameters that can be integrated into gene essentiality prediction algorithms.
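The abstract does not state which statistical test was used. One standard way to ask whether enzymes are enriched among essential genes is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the Python sketch below computes its p-value and the fold enrichment. The gene counts are toy numbers chosen for illustration, not data from the study.

```python
from math import comb

def enrichment_p_value(k: int, K: int, n: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the genome, K: enzyme-encoding genes,
    n: essential genes, k: essential genes that encode enzymes.
    """
    upper = min(K, n)
    # Count the ways to draw at least k enzymes among n essential genes
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def fold_enrichment(k: int, K: int, n: int, N: int) -> float:
    """Enzyme fraction among essential genes relative to the genome-wide fraction."""
    return (k / n) / (K / N)

# Toy counts: 20 genes, 10 enzymes, 5 essential genes, all 5 of them enzymes
p = enrichment_p_value(k=5, K=10, n=5, N=20)
fold = fold_enrichment(k=5, K=10, n=5, N=20)
print(f"p = {p:.4f}, fold = {fold:.1f}")  # p = 0.0163, fold = 2.0
```

The same function can test depletion (as reported for oxidoreductases) by summing the lower tail instead; `math.comb` requires Python 3.8 or later.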

    Ethnic identity and personal networks among young people in Sarajevo (Identidad étnica y redes personales entre jóvenes de Sarajevo)

    After fieldwork conducted among young people in Sarajevo, we found a relationship between the discourses they sustain and the ethnic categories they use to classify others and to identify themselves. We also found that young people who self-identify as "Bosnians" play an important role in the network of multiethnic relationships, in which strong ties, surprisingly, are still very important. Finally, we found a relationship between the composition of personal networks and the ethnic discourses that are maintained. We live, or believe we live, in multiple "communities", imagined or not. At the same time, the individual, rather than the place, the family or the group, stands at the centre of social life and communication (cf. Wellman, 2001). In this context, driven by the advance of flexible capitalism (Castells, 1996), we argue that adequately understanding the identity or identities that individuals claim requires studying personal networks and their dynamics. From this perspective, we cannot speak of "ethnic groups" or "multiethnicity" without further qualification, because these concepts rest on an essentialist and static conception of individual identity. The concept of a "multiethnic society" is used in a deceptively progressive and objective way: what it actually legitimizes is the existence of essential differences between people, driving them apart rather than bringing them together.
Nevertheless, we are fully aware that essentialist discourses of ethnic identity are ubiquitous, with enormous political and personal effects. Arguing that the essentialist conception of identity is inappropriate from an academic standpoint does not mean that it is not used politically, and therefore that it does not have formidable consequences for social relations. It is precisely the study of personal networks that lets us adopt a perspective that does not use "folk" concepts such as "ethnic group", "people" or "nation" for analytical purposes, but instead places them within the discourses sustained by the actors (and by states and the media) and allows us to contextualize them through etic concepts, that is, concepts imposed by the researchers. Only in this way can we overcome the tautologies that abound in ethnic discourses.

    Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate

    Rates of evolution differ widely among proteins, but the causes and consequences of such differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the correlations between evolutionary rate and these genomic features remains a major challenge. Here, we use an integrated probabilistic modeling approach to study genomic correlates of protein evolutionary rate in Saccharomyces cerevisiae. We measure and rank degrees of association between (i) an approximate measure of protein evolutionary rate with high genome coverage and (ii) a diverse list of protein properties (sequence, structural, functional, network, and phenotypic). Among many statistically significant correlations, we observe that slowly evolving proteins tend to be regulated by more transcription factors, are deficient in predicted structural disorder, are involved in characteristic biological functions (such as translation), are biased in amino acid composition, and are generally more abundant, more essential, and enriched for interaction partners. Many of these results agree with recent studies. In addition, we assess the information contributed by different subsets of these protein properties to the task of predicting slowly evolving proteins. We employ a logistic regression model on binned data that can account for intercorrelation, non-linearity, and heterogeneity within features. Our model considers features both individually and in natural ensembles (“meta-features”) in order to assess their joint information contribution and the degree to which their contributions are independent. Meta-features based on protein abundance and amino acid composition make strong, partially independent contributions to the task of predicting slowly evolving proteins; other meta-features make additional minor contributions.
The combination of all meta-features yields predictions comparable to those based on paired species comparisons, approaching the predictive limit of optimal lineage-insensitive features. Our integrated assessment framework can be readily extended to other correlational analyses at the genome scale.
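The idea of fitting a logistic regression on binned features can be sketched in Python under simplifying assumptions: a single made-up feature (standing in for protein abundance) is cut into quantile bins, each bin becomes an indicator column, and the model is fitted by plain batch gradient descent. Binning lets the fitted probability vary freely across the feature's range instead of assuming a linear trend. The bin rule, learning rate, epoch count and data are all hypothetical; the study's actual model also handles intercorrelation and heterogeneity, which this sketch does not attempt.

```python
import math
from typing import List, Sequence

def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior cut points at empirical quantiles (simple nearest-rank rule)."""
    s = sorted(values)
    return [s[len(s) * q // n_bins] for q in range(1, n_bins)]

def bin_index(x: float, edges: Sequence[float]) -> int:
    """Bin number in [0, len(edges)]: how many cut points x reaches."""
    return sum(1 for e in edges if x >= e)

def one_hot(values: Sequence[float], edges: Sequence[float]) -> List[List[float]]:
    """Encode each value as an indicator vector over its bin."""
    rows = []
    for v in values:
        row = [0.0] * (len(edges) + 1)
        row[bin_index(v, edges)] = 1.0
        rows.append(row)
    return rows

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on mean log-loss. No separate intercept:
    the bin indicators already give every bin its own baseline."""
    w = [0.0] * len(X[0])
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / m for wj, g in zip(w, grad)]
    return w

# Toy, made-up data: feature values and labels (1 = "slowly evolving")
abundance = [1, 2, 3, 4, 5, 6, 7, 8]
slow = [0, 0, 0, 1, 0, 1, 1, 1]
edges = quantile_edges(abundance, n_bins=2)
weights = fit_logistic(one_hot(abundance, edges), slow)
p_low = sigmoid(weights[bin_index(2, edges)])
p_high = sigmoid(weights[bin_index(6, edges)])
print(round(p_low, 3), round(p_high, 3))  # 0.25 0.75
```

Indicator columns from several features could be concatenated into one design matrix to mimic a "meta-feature"; in that case a regularization term would help, because the indicator groups are collinear.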

    Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

    Background: The identification of essential genes is important for understanding the minimal requirements for cellular life and for practical purposes such as drug design. However, experimental techniques for discovering essential genes are labor-intensive and time-consuming. Given these constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present a machine learning-based computational approach that relies on network topological features, cellular localization and biological process information to predict essential genes.
    Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. The best-performing predictors were those generated from datasets with integrated attributes. Using all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes, which we then used to classify yeast genes of unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality.
    Conclusion: We demonstrated that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees from these data, we could discover cellular rules governing essentiality.
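J48 is Weka's implementation of the C4.5 decision-tree algorithm, which chooses each split by gain ratio (information gain divided by split information). The Python sketch below computes that criterion for categorical attributes on a tiny made-up table; the attribute names, values and labels are hypothetical stand-ins for the features named in the abstract. It shows only the split criterion, not tree growth, numeric thresholds or pruning.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows: List[Dict[str, str]], attr: str, target: str) -> float:
    """C4.5-style gain ratio of splitting `rows` on categorical `attr`."""
    n = len(rows)
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Expected entropy after the split, weighted by branch size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalizes attributes with many small branches
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy genes: discretized interaction count, nuclear localization, label
genes = [
    {"interactions": "high", "nuclear": "yes", "essential": "yes"},
    {"interactions": "high", "nuclear": "no", "essential": "yes"},
    {"interactions": "high", "nuclear": "yes", "essential": "yes"},
    {"interactions": "low", "nuclear": "yes", "essential": "no"},
    {"interactions": "low", "nuclear": "no", "essential": "no"},
    {"interactions": "low", "nuclear": "no", "essential": "no"},
]
scores = {a: gain_ratio(genes, a, "essential") for a in ("interactions", "nuclear")}
best = max(scores, key=scores.get)
print(best, {a: round(s, 3) for a, s in scores.items()})
# interactions {'interactions': 1.0, 'nuclear': 0.082}
```

A full learner would apply this choice recursively to each branch until the labels are pure or too few rows remain, then prune.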

    Genome-wide essential gene identification in Streptococcus sanguinis

    A clear understanding of gene essentiality in bacterial pathogens is pivotal for identifying drug targets to combat the emergence of new pathogens and antibiotic-resistant bacteria, for synthetic biology, and for understanding the origins of life. We constructed a comprehensive set of deletion mutants and systematically identified a clearly defined set of essential genes for Streptococcus sanguinis. Our results were confirmed by growing S. sanguinis in minimal medium and by double knockouts of paralogous or isozyme genes. Careful examination revealed that these essential genes were associated with only three basic categories of biological function: maintenance of the cell envelope, energy production, and processing of genetic information. This finding was subsequently validated in two other pathogenic streptococcal species, Streptococcus pneumoniae and Streptococcus mutans, and in two other Gram-positive bacteria, Bacillus subtilis and Staphylococcus aureus. Our analysis has thus led to a simplified model that permits reliable prediction of gene essentiality.